Protein Engineering, Design and Selection — Latest Matching Preprints

1

Minimal Data, Maximal Insight (MDMI): A Structure-guided Pipeline for Discovering Functional Alternatives in Peptide-Protein Interfaces

Bayat, P.; Perkins, S. J.; Clancy, S.; Patel, S. S.; Yin, R. F.; Bozovicar, K.; Singh, S.; Shrestha, S.; Moustafa, Z.; Zayani, R.; IWE, I.; Bayat, S.; Kelly, P.; Vigar, J. R. J.; White, V. Y.; Xie, M.; Simchi, M.; Palter, S.; Nguyen, J.; Zeisler, I. Y.; Wu, B.; Pardee, K.

2026-07-14 synthetic biology 10.64898/2026.07.13.737974 medRxiv

Top 0.1%

2.3%

Discovering functional peptides across vast sequence space remains a formidable challenge, particularly when experimental training data is scarce. We present Minimal Data Maximal Insight (MDMI), a two-stage structure-guided computational pipeline that designs functional peptide variants using only a small, annotated dataset. Rather than relying on sequence information alone, MDMI integrates three-dimensional structural features derived from predicted peptide-protein complexes into a machine learning model that captures interface geometry and binding energetics. This structure-aware predictor, paired with a genetic algorithm for sequence exploration, reduced false positives from 70% to close to zero in an all-negative computational benchmark panel compared with a sequence-only model, and produced approximately four-fold more high-confidence in silico binders than state-of-the-art peptide/protein design baselines. Using the split-GFP system as a testbed, where fluorescence provides a direct functional readout of peptide-protein complementation, MDMI identified peptides with up to 38% sequence divergence from wild-type in Stage 1 while retaining measurable activity. In Stage 2, motif-guided recombination of successful Stage 1 variants produced highly divergent yet functional peptides bearing over 50% sequence difference from wild-type, revealing two distinct functional clusters in sequence space. As further validation, a top-performing candidate expressed as a full-length GFP fusion retained a GFP-like emission profile, supporting formation of a fluorescent GFP-like scaffold. These results demonstrate that structure-informed pipelines can uncover remote functional sequence space from minimal data, with broad implications for peptide and therapeutic analog discovery.
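
The divergence figures quoted in this abstract (38%, over 50%) can be read as the share of positions at which a variant differs from wild-type. The sketch below illustrates that reading only; the abstract does not state how divergence was computed, so the equal-length, position-by-position comparison is an assumption, and the function name and peptide strings are made up for illustration.

```python
def percent_divergence(variant: str, wild_type: str) -> float:
    """Percentage of aligned positions at which variant differs from wild_type.

    Assumption: both sequences have the same length and are compared
    position by position; no alignment or gap handling is performed.
    """
    if len(variant) != len(wild_type):
        raise ValueError("sequences must be the same length (no alignment is done here)")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(variant, wild_type))
    return 100.0 * mismatches / len(wild_type)


# Made-up 10-residue peptides, not sequences from the preprint.
reference = "ACDEFGHIKL"
candidate = "ACDEYGHIKV"
print(percent_divergence(candidate, reference))  # 20.0
```

For peptides of different lengths, a pairwise alignment would be needed before counting mismatches.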

2

Scalable Production of a De Novo SARS-CoV-2 Antiviral miniprotein in Escherichia coli

Shin, J.; Kim, E.-m.; Jang, J.-h.; Jee, S.-w.; Kim, S.-h.; Yu, S.; Yoon, M.; Craig, D.; Swoyer, R.; Alamuri, P.; Price, A.; Patel, S.; Ravichandran, R.; Carter, L.; Pallerla, S.

2026-06-24 bioengineering 10.64898/2026.06.23.734092 medRxiv

Top 0.1%

2.1%

The rapid emergence of SARS-CoV-2 variants that evade neutralizing antibodies underscores the need for next-generation antiviral biologics that combine molecular precision with scalable, cost-effective manufacturing. Computationally designed miniproteins targeting the receptor-binding domain (RBD) of the spike protein offer a compelling alternative to monoclonal antibodies due to their small size, high thermal stability, and compatibility with microbial expression systems. Here we report the end-to-end development and cGMP production of IPD-52520, a de novo antiviral miniprotein, using an optimized E. coli platform. Two miniprotein candidates, a homotrimeric construct (Trimer, referred to as IPD-52520, 17 kDa) and a tandem fusion (Daisy, referred to as IPD-52521, 25 kDa), were evaluated in parallel through systematic optimization of strain selection, media composition, fed-batch fermentation, inclusion-body solubilization, refolding, and chromatographic purification. The Trimer was downselected as the lead molecule based on superior preclinical efficacy, favorable pharmacokinetic properties, and higher volumetric manufacturing yields. The optimized process delivers approximately 2 g/L of purified protein at greater than 90% purity. Scale-up from 5 L to 50 L under cGMP conditions demonstrated excellent batch-to-batch reproducibility across six independent batches, supporting nonclinical and Phase 1 clinical supply. Comprehensive biophysical characterization confirmed a well-folded, predominantly alpha-helical trimer (Tm = 73.4 °C; polydispersity = 1.005) with an intact primary structure and strong target-binding affinity (KD < 1 pM). Real-time stability studies indicate that the drug substance is stable at 2-8 °C for at least 12 months, with ongoing stability studies.
These results demonstrate the feasibility of translating computationally designed antiviral miniproteins into manufacturable biologics and provide a platform applicable to rapid-response therapeutics against current and future pandemic threats.

3

Prediction-Guided Design of a More Developable FGF21 Construct

Bozkurt, C.; Nathanail, E.; Goteti, A.

2026-07-14 bioengineering 10.64898/2026.07.13.738140 medRxiv

Top 0.1%

1.8%

For structural-biology and protein-production pipelines, the hardest part of a difficult protein is not the biology -- it is obtaining a well-behaved sample for functional studies. Programs routinely stall at construct design, expression, and purification: deciding where to truncate, which tags to use, how to express, and how to purify so the protein survives concentration and handling. These decisions are still made largely by literature precedent and experimental experience, and they require trial-and-error before arriving at a functional construct for hard targets. We present a prospective, single-pair wet-lab case study testing whether an integrated computational platform can improve these decisions. For human fibroblast growth factor 21 (FGF21) -- a clinically important and stability-challenged metabolic hormone -- we compared two expression constructs produced side by side under the same experimental workflow, using two different design strategies: one designed by a scientist from the literature (reproducing the published core-domain construct, PDB 6M6E), and one designed by the Orbion platform -- an AI, prediction-guided protein-design system (orbion.life) -- which additionally generated the expression and purification protocols (executed scientist-in-the-loop). The platform's construct used an unconventional, longer C-terminal boundary not found in public sequence databases. Since the two constructs differ in more than one feature, we treat them as workflow-level designs throughout. The scientist construct gave a higher initial yield (~2.4x more protein recovered at affinity capture). The platform-designed construct, however, showed a more favourable downstream developability profile: it reached a higher concentration (1.4 vs 0.7 mg/mL) while remaining more monodisperse by dynamic light scattering (DLS).
The scientist construct, in contrast, aggregated on concentration, so its initial-yield advantage did not survive: in the final concentrated sample the Orbion construct provided the more usable material for downstream studies. Computed for the mammalian host used, the platform had prospectively scored its own design higher (composite 68.7 vs 59.0 for the scientist-designed construct), and its predictions of yield, solubility, and disorder matched the wet-lab outcome. This is a single, deliberately scoped case study, not a population-level benchmark; the two constructs differ in more than one feature, and biological activity was not assayed. Notwithstanding the bottlenecks of this approach discussed here, prediction-guided construct and protocol design, used as a decision aid, has the potential to remove costly iteration cycles from protein production campaigns.

4

Structure-function studies of HRIKD-ΔKI, a Minimal Kinase Domain of Human Heme-Regulated Inhibitor Kinase

Rajasekaran, M. B.; Booth, J.; Crepin, D. F.; Roe, S. M.; Zhou, L.; Gianga, T.-M.; Siligardi, G.; Gonzalez-Mendez, R.; Staikopoulou, M.; Hassan, H.; Oliver, A.; Mancini, E.; Spencer, J.

2026-07-07 biochemistry 10.64898/2026.07.06.735516 medRxiv

Top 0.1%

1.7%

eIF2α kinase heme-regulated inhibitor (HRI) is a novel target for haematological malignancies, with modulators reported to trigger cell death via the HRI-eIF2α-ATF4 pathway. We report a protocol for producing the minimal kinase domain of full-length human HRI, termed HRIKD-ΔKI, in which the unstructured 140 amino acid (aa) kinase insert (KI) within the HRI kinase domain (HRIKD) is replaced with a 2 aa glycine/serine (GS) linker. X-ray crystal structures of apo-HRIKD-ΔKI and of its complex with ATP were determined at 2.1 and 2.5 Å resolution, respectively. Both structures display a canonical bi-lobal kinase fold. However, they remain in a non-productive state with a displaced C-helix, disassembled R-spine, and a disordered activation segment hindering the substrate site. Biophysical assays (fluorescence-based thermal shift and synchrotron radiation circular dichroism) demonstrate that HRIKD-ΔKI retains its functional ligand-binding conformation. Altogether, these findings define structural and ligand-binding features of HRI to support ongoing drug discovery efforts in blood cancer.

5

Structure-guided computational design and mechanistic understanding of the p95HER2-targeting NAZ-mAb antibody and its variants

Rawat, P.; Kyte, J. A.; Greiff, V.; Dorraji, E.

2026-07-11 bioinformatics 10.64898/2026.07.07.736817 medRxiv

Top 0.1%

1.1%

Human epidermal growth factor receptor 2 (HER2) is an oncogenic receptor tyrosine kinase in breast cancer and other malignancies. A subset of HER2-positive tumours expresses 611-CTF-p95HER2, a tumour-specific, hyperactive truncated isoform associated with metastasis and treatment resistance that lacks most of the extracellular domain targeted by conventional HER2-directed antibodies. We previously developed NAZ-mAb (formerly known as Oslo-2), a monoclonal antibody against 611-CTF-p95HER2. Here, we describe a computational antibody-engineering workflow for designing variants of NAZ-mAb. Starting from the sequence alone, we modeled the NAZ-mAb-611-CTF-p95HER2 complex, generated a combinatorial mutational landscape using FoldX 5.0, and prioritized candidate variants using predicted interaction energy and developability criteria. Two variants representing distinct design strategies were selected for validation: an aromatic double mutant, NAZ-mAb v1 (L:S31W/L:H107W), and a conservative single mutant, NAZ-mAb v2 (L:S31M). Both variants were successfully expressed as recombinant IgGs; NAZ-mAb v2 achieved a five-fold higher recombinant expression yield than parental NAZ-mAb, while both variants retained antigen binding with a higher apparent signal than the parental antibody in indirect ELISA. However, Biacore two-state kinetic analysis revealed weaker affinities than the parental antibody (KD NAZ-mAb v1: 32.6 nM, NAZ-mAb v2: 9.45 nM vs. parental NAZ-mAb: 5.33 nM). These findings show that the computational workflow can generate experimentally tractable, antigen-engaging NAZ-mAb variants, while also highlighting the limitations of fixed-backbone interaction-energy ranking as a predictor of binding affinity and yield. This study provides a practical framework for computationally driven, developability-aware antibody optimization in the absence of experimental structural data.

6

ComplexDesign: sequence-hallucination design of protein binders bridging multiple proteins

Xu, J.; Ren, M.; Qi, N.; Zhang, X.; He, Z.; Yu, C.; Bu, D.

2026-06-24 bioinformatics 10.64898/2026.06.21.733655 medRxiv

Top 0.2%

1.0%

Motivation: Designing multichain protein complexes requires coordinating the folding of component proteins with the formation of their interfaces. Existing methods, however, remain limited in their ability to satisfy these requirements simultaneously, especially for trimeric and tetrameric complexes. As an important practical scenario, designing a binder that bridges two target proteins into a ternary complex requires flexibility in the relative arrangement of the two targets, which poses an additional challenge for existing design methods. Results: We present ComplexDesign, a hallucination-based approach for multichain protein design. ComplexDesign performs structure-prediction-guided sequence optimization to simultaneously fold each protein chain and form inter-chain interactions that bind them together. To provide the flexibility required to appropriately arrange these target proteins, ComplexDesign introduces a specialized masking mechanism that enables exploration of possible relative arrangements rather than being limited to the predefined ones. Across a comprehensive set of benchmarks with various chain lengths, ComplexDesign outperformed existing methods in the unconditional design of dimers, trimers, and tetramers, achieving a design success rate exceeding 50%, supporting its capability for multichain complex design. Furthermore, in the case of multi-target binder design, ComplexDesign produced high-confidence, self-consistent ternary complexes for 8 out of 10 target pairs. These results establish ComplexDesign as an effective tool for multichain protein design, with particular utility for designing binders that bridge two target proteins. Availability and implementation: The source code of ComplexDesign will be made publicly available upon publication.

7

Prosculpt: Lowering the Barrier to Computational Protein Design

Olivieri, F.; Konstantinova, A.; Ribnikar, N.; Bizjak, N.; Žnidar, ?.; Abel, K.; Rajh, E.; Ljubetič, A.

2026-06-26 synthetic biology 10.64898/2026.06.25.732351 medRxiv

Top 0.2%

0.8%

Over the past decade, protein design has evolved from a specialized discipline into a broadly accessible approach for engineering and interrogating biological systems. Despite these advances, protein design continues to be a technically challenging task, often requiring knowledge of programming to be able to use and combine the different software packages. To address this challenge, we have developed Prosculpt, an easy-to-use protein design pipeline. Prosculpt integrates RFdiffusion for backbone generation, ProteinMPNN for sequence design and multiple structure-prediction platforms (AF2, AF3, Colabfold, Boltz2). Candidate designs are evaluated using customizable Rosetta-based scoring protocols. Each project is specified through a single configuration file, enabling users with minimal computational expertise to perform sophisticated protein design tasks without writing code, while also allowing advanced users to access the full capabilities of the underlying programs. Prosculpt supports a wide range of applications, including design of symmetric homo-oligomers, design of binders, motif scaffolding, partial diffusion and fixed-backbone sequence redesign. By combining these capabilities within a single, user-friendly platform, Prosculpt provides a practical entry point to modern protein design for both novice and expert users.

8

Hot Pursuit: Bioinformatic and Biochemical Characterization of a Hyperthermophilic Family B DNA Polymerase from Pyrolobus fumarii A1

Rusinek, W.; Dorawa, S.; Kaczorowski, T.

2026-06-26 biochemistry 10.64898/2026.06.25.734501 medRxiv

Top 0.2%

0.6%

Thermostable DNA polymerases are indispensable tools in molecular biology, yet enzymes from the most extreme hyperthermophiles remain largely uncharacterized. Here, we report the biochemical and structural characterization of a family B DNA polymerase from Pyrolobus fumarii A1 (Pyrfu pol), one of the most thermoresistant archaea described to date. The enzyme was efficiently overproduced in E. coli Rosetta 2(DE3)[pLysS] and purified to homogeneity using a two-step protocol that combined heat treatment with immobilized metal affinity chromatography (IMAC). Bioinformatic analysis confirmed the canonical family B architecture, while AlphaFold-based structural modeling and comparative analysis with mesophilic RB69 DNA polymerase revealed a well-conserved structural core alongside thermoadaptive features. Radiolabel incorporation assays demonstrated enzymatic activity over a broad ionic strength range and an absolute requirement for Mg ions. PCR-based optimization confirmed these findings and revealed broad pH tolerance (6.5-11.0). Notably, Tris inhibited radiolabel-based assays (pH 7.0) yet proved essential for efficient PCR amplification (pH 8.5), suggesting a context-dependent role of buffer composition in polymerase activity. Processivity assays confirmed amplification of DNA fragments up to approximately 8,000 bp. Replication fidelity, assessed by the lacZ-based assay, showed a 2.9-fold improvement over Taq polymerase. Urea-nanoDSF yielded an exceptional melting temperature of 105.9 ± 0.08 °C. Pyrfu pol also demonstrated tolerance to common PCR inhibitors, highlighting its potential utility in molecular biology applications.

9

Benchmarking AlphaFold and related deep learning approaches for modeling antibody and TCR antigen recognition

Yin, R.; Saravanakumar, S.; Shi, S. Y.; Park, M.; Lin, V.; Lee, J.; Cheung, M.; Felbinger, N.; Kaufman, S.; Eisenberg, M.; Pierce, B.

2026-07-06 bioinformatics 10.64898/2026.07.04.736425 medRxiv

Top 0.4%

0.5%

Determining the structural basis of antigen recognition by antibodies and T cell receptors (TCRs) provides critical insights into effective immune targeting and can inform design of biotherapeutics and vaccines. Accurate computational modeling of antibodies and TCRs in complex with their targets poses a major challenge for predictive methods, including AlphaFold, which is generally accurate for modeling protein complexes but has shown limited success for immune recognition. In this study we assessed the performance of AlphaFold2, AlphaFold3, increased sampling protocols, and related deep learning methods for modeling antibody-protein, antibody-peptide, and TCR-peptide-major histocompatibility complex (pMHC) recognition. We show that increased sampling and AlphaFold3 generally improve performance relative to default sampling and AlphaFold2; however, predictive accuracy and improvement levels varied considerably among interface classes, with antibody-peptide complexes representing a challenge despite their small antigen size. Comparing per-case success across methods showed some complementarity, indicating opportunities for increased success through model pooling approaches, for instance increasing antibody-peptide near-native success from 41% to 59%. Analysis of AlphaFold confidence scores and modeling of a noncanonical complex provided further insights into predictive performance. These results highlight considerations for predictive antibody and TCR complex modeling efforts, while revealing key distinctions among protocols, scoring, and immune complex classes.

10

CD117 epitope-shielded hematopoietic stem cell transplantation with toxin-free conditioning and in vivo selection ameliorates β-thalassemia model

Marone, R.; Lepore, R.; Paschoudi, K.; Zuin, J.; Sinopoli, A.; Camus, A.; Burgold, T.; Bartoszek, E.; Calabrese, D.; Toranelli, M.; Wittwer, J.; Rhiel, M.; Andrieux, G.; Li, C.; Hsu, A.; Wiederkehr, A.; Wellinger, L. C.; Grossjohann, E.-M.; Ten Buren, E.; Brault, J.; Garcia Prat, L.; Lehmann, F.; Do Sacramento, V.; Christopher Divsalar, C.; Yumlu, S.; Liu, D. R.; Lieber, A.; Cathomen, T.; Cornu, T. I.; Yannaki, E.; Stefanie Urlinger, S.; Jeker, L. T.

2026-07-08 bioengineering 10.64898/2026.07.07.736903 medRxiv

Top 0.4%

0.5%

Clinical evidence demonstrates that ex vivo gene therapy and genome engineering of hematopoietic stem and progenitor cells (HSPCs) could represent one-time cures. However, while genome editing itself has become increasingly efficient and precise, the toxic conditioning required for hematopoietic stem cell transplantation remains a major barrier to broad clinical implementation of these otherwise curative therapies. In particular, the use of busulfan for myeloablative conditioning constitutes a major safety concern. While preclinical studies established CD117 as a promising target for antigen-specific therapy, clinical translation faced setbacks balancing efficacy and safety. To overcome current limitations, we generated a new CD117-blocking monoclonal antibody (CIM058) and demonstrate its potency to block wild-type HSPCs. To enable long-term blockade of host HSPCs even after transplantation, we used prime editing to engineer CIM058-resistant human CD34+ HSPCs. When combined, CIM058 and the epitope engineered CD34+ HSPCs ameliorated disease phenotype in a β-thalassemia model. Our results suggest that this approach may overcome the reliance on busulfan or other myeloablative conditioning regimens with their associated morbidities, and by enabling toxin-free conditioning and in vivo selection of edited cells, may facilitate clinical implementation of these highly valuable genetic therapies.

11

Identifying and Addressing Systematic Data Leakage in Protein-Ligand Affinity Benchmarks

Mattsson, B.; Walters, W.

2026-06-30 molecular biology 10.64898/2026.06.29.735309 medRxiv

Top 0.4%

0.4%

Accurate prediction of protein-ligand binding affinity is a crucial goal in structure-based drug discovery, with the potential to significantly shorten development timelines. Recently, a new wave of machine learning models based on co-folding, such as Boltz-2 and IsoDDE, has demonstrated performance that matches or exceeds that of gold-standard physics-based methods like Free Energy Perturbation (FEP). This paper provides a critical assessment of these claims, revealing that current benchmarks are heavily influenced by data leakage, and proposes a new benchmark that explicitly controls for data leakage. We demonstrate that splitting by protein-sequence identity is inherently insufficient to prevent data leakage due to "target mirroring," in which homologous proteins with low overall sequence identity still exhibit highly correlated binding profiles. Our meta-analysis of documents in the ChEMBL 36 database identifies more than 6,000 such assay pairs and finds that leakage persists for sequence-identity thresholds as low as 0.2, well below the values commonly used in benchmarks today. Additionally, we show that a ligand-only baseline model, which lacks protein structural information, achieves surprisingly high performance on the FEP+ 4 and OpenFE benchmarks (r = 0.66 and r = 0.36, respectively). Our results indicate that current benchmarks tend to reward models for memorizing training data and exploiting localized leakage rather than truly learning biophysical principles. To address this issue, we propose the Novelty-Tiered Affinity Benchmark, in which the test data is partitioned into ligand novelty tiers. In the most challenging tier (Tanimoto similarity < 0.35), ligand-only models perform notably worse (r = 0.14), offering a clear baseline for evaluating genuine generalization. We argue that the field must move beyond sequence-based splits to ensure that AI-driven discovery translates into successful prospective laboratory research.
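
The novelty-tier idea in this abstract can be illustrated with a small sketch that bins each test ligand by its highest Tanimoto similarity to any training ligand. Only the 0.35 cutoff for the hardest tier comes from the abstract. Treating similarity as "maximum over the training set", the second cutoff of 0.6, the tier names, and the representation of fingerprints as sets of integer bit indices are all assumptions made for illustration, not the authors' protocol.

```python
from typing import FrozenSet, Iterable, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def novelty_tier(
    test_fp: FrozenSet[int],
    train_fps: Iterable[FrozenSet[int]],
    cutoffs: Tuple[float, float] = (0.35, 0.6),  # 0.6 is a hypothetical value
) -> str:
    """Bin a test ligand by its nearest-neighbour similarity to the training set."""
    nearest = max((tanimoto(test_fp, t) for t in train_fps), default=0.0)
    if nearest < cutoffs[0]:
        return "novel"         # hardest tier: Tanimoto similarity < 0.35
    if nearest < cutoffs[1]:
        return "intermediate"  # hypothetical middle tier
    return "similar"


# Toy fingerprints (bit indices), not real molecules.
train = [frozenset({1, 2, 3, 4}), frozenset({10, 11})]
print(novelty_tier(frozenset({1, 20, 21, 22}), train))  # novel
print(novelty_tier(frozenset({1, 2, 3, 8, 9}), train))  # intermediate
print(novelty_tier(frozenset({1, 2, 3, 4}), train))     # similar
```

In practice the fingerprints would come from a cheminformatics toolkit; the tiering logic is independent of how they are generated.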

12

Acquiring Improved Protein Variants With Probabilistic Preferential Learning

van der Flier, F. J.; de Ridder, D.; Probst, D.; Redestig, H.

2026-06-26 bioinformatics 10.64898/2026.06.22.733688 medRxiv

Top 0.4%

0.4%

Variant effect prediction (VEP) models can be used to select promising novel enzymes from a pool of candidates. Most supervised VEP models are framed as regression tasks, placing more emphasis on getting the predicted quantities correct than on the relative comparison of individual candidates. Preferential or contrastive models may better align with the goal of selection, or acquisition, especially when informed by predictive uncertainty. Here, we introduce a probabilistic preferential learning model based on the Kermut Gaussian process (PKermut) that we designed with the ambition to increase the hit rate among selected variants. We benchmark PKermut against established models, including the original Kermut, the RITA regressor, and an augmented Potts model, on 69 curated ProteinGym datasets across various assay categories. To evaluate acquisition performance, we propose a novel quantile cross-validation scheme that ensures the evaluation of a model's ability to extrapolate by reserving high-performing variants exclusively for the test set. We assess models using Spearman correlation and evaluate their acquisition performance using five different acquisition functions, encompassing both uncertainty-aware and unaware strategies. Our experimental results indicate that uncertainty estimates improve the acquisition ability of our models, and that strategies that reward uncertainty generally result in better outcomes than those that do not on single-mutation variant datasets. We observe that PKermut's Spearman scores and ability to acquire improved variants are greatly affected by the number of variant comparisons sampled in the training set. Kermut achieves the highest Spearman correlation in 54/69 datasets (78%), compared to 12/69 (17%) for PKermut. For acquisition performance, Kermut leads in 44/69 datasets (64%), while PKermut leads in 15/69 (22%).
While at this stage PKermut is not a recommended alternative to Kermut, its contrastive nature offers several conceptual opportunities. We share our findings to inspire further development aimed at improving the alignment between training objectives of VEP models and their downstream application in protein engineering.
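
The quantile-based evaluation described in this abstract reserves high-performing variants for the test set so that a model must extrapolate beyond the scores it was trained on. The sketch below shows one simple reading of that idea as a single split at a score quantile. The abstract does not give the exact scheme, so the function name, the default quantile of 0.8, and the single-split design are assumptions.

```python
from typing import List, Sequence, Tuple


def quantile_split(scores: Sequence[float], q: float = 0.8) -> Tuple[List[int], List[int]]:
    """Split variant indices so the top (1 - q) fraction by score is test-only.

    Returns (train_indices, test_indices). Every training variant scores no
    higher than any test variant, so evaluation probes extrapolation.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending by score
    cut = round(len(scores) * q)  # number of variants kept for training
    return sorted(order[:cut]), sorted(order[cut:])


# Toy assay scores for five variants (made-up values).
assay = [0.1, 0.9, 0.4, 0.7, 0.2]
train_idx, test_idx = quantile_split(assay, q=0.8)
print(train_idx, test_idx)  # [0, 2, 3, 4] [1]
```

A cross-validated version could repeat this with folds drawn inside the training portion, keeping the top-scoring variants held out throughout.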

13

Function-guided design of active enzymes

Hu, M.; Wu, L.; Yang, Y.; Li, F.; Zhu, L.

2026-06-29 bioinformatics 10.64898/2026.06.27.735025 medRxiv

Top 0.5%

0.4%

Designing enzymes from functional descriptions remains challenging because catalytic activity is governed by sequence-structure-function relationships. Here we present EnzymeArt, a function-conditioned enzyme-design framework centred on a generative sequence model. EnzymeArt couples function-conditioned sequence generation with structure-guided refinement, annotation checks and substrate-aware computational prioritization to select candidates for synthesis and biochemical testing. Across alcohol dehydrogenase (ADH), malate dehydrogenase (MDH) and triacylglycerol lipase design campaigns, 57 of 60 synthesized designs showed crude-lysate activity above matched background controls. Purified representatives further showed quantitative steady-state catalytic activity. The best designed ADH reached kcat = 223.7/s and exceeded a wild-type reference under matched conditions, an MDH reached kcat = 267.57/s despite having only 33% sequence identity to its closest BLASTP hit, and a designed lipase hydrolysed both short- and long-chain triglycerides with apparent activity modestly above that of a commercial lipase reference. Together, these results establish a route for converting functional descriptions into experimentally validated enzyme designs with quantitative steady-state kinetic activity.

14

FCRL5 is a fucose-sensitive IgG-Fc receptor with binding properties distinct from classical Fcγ receptors

van der Hoeven, N.; Holborough-Kerkvliet, M. D.; Bao, Y.; Bentlage, A. E.; de Heer-Ooijevaar, P.; Derksen, N. I.; Damelang, T.; de Kreuk, B.-J.; Labrijn, A. F.; Vidarsson, G.; Rispens, T.

2026-07-07 immunology 10.64898/2026.07.01.735886 medRxiv

Top 0.5%

0.4%

Fc receptor-like protein 5 (FCRL5) is a low-affinity IgG receptor expressed on B cells, with emerging therapeutic relevance due to its expression on multiple myeloma cells, and a potential role in regulating B cell responses. Previous reports on the FCRL5-IgG interaction vary widely in reported affinities, binding differences across IgG subclasses, and molecular requirements for maximal binding. Furthermore, the impact of Fc-engineering strategies, as used in (therapeutic) monoclonal antibodies, remains poorly understood. Here, we provide a comprehensive biochemical analysis of the FCRL5-IgG interaction. We demonstrate that FCRL5 is a true IgG Fc-receptor, binding with very low affinity (60-80 µM). FCRL5 binds IgG in a manner involving primarily the two N-terminal domains of FCRL5, and the third domain for maximal binding, but with distinct essential residues in the IgG Fc-tail. Surface plasmon resonance analysis of the binding of FCRL5 to the various IgG subclasses revealed a preference for IgG1 and IgG4. Interestingly, various Fc-engineered IgG variants commonly used for silencing or enhancing of Fc receptor binding do not impact FCRL5 binding. Screening the binding of a set of IgG antibodies carrying defined sets of Fc-mutations to FCRL5 revealed E293 as a key binding determinant and led to the discovery of E293R as a mutation that selectively abrogates FCRL5 binding while preserving binding to other classical FcγRs. Lastly, we show that FCRL5 has considerable preference for binding afucosylated IgG. Together, our results define the essential characteristics of the IgG-FCRL5 interaction and demonstrate the potential of both naturally occurring IgG variants as well as therapeutically explored bioengineered IgG formats to differentially engage FCRL5.

15

Capabilities, specificity gaps and training-data dependence of AlphaFold3 across diverse application areas

Follonier, O.; Liu, Y.; Campomanes, P.; Lafrenaye, L.; Racle, J.; Alvarez, D.; van Gerwen, J.; Heinzmann, R.; Jänes, J.; Kummelstedt, E.; Durairaj, J.; Gfeller, D.; Vanni, S.; Beltrao, P.

2026-07-13 bioinformatics 10.64898/2026.07.13.738147 medRxiv

Top 0.5%

0.4%

Structure prediction models have moved from single proteins to assemblies that include diverse biomolecules and their modifications. AlphaFold3 (AF3) and related models extended structural modelling via an all-atom framework, opening many new potential applications in structural biology. We evaluate how well the new capabilities of AF3 translate into application tasks in diverse areas: prediction of ubiquitinated protein structures, T-cell receptor (TCR)-epitope recognition, antibody-antigen complexes, protein-RNA and protein-lipid interactions. We find that, while AF3 can perform well in favourable settings, this performance is uneven across applications. In RNA-target predictions, the model confidence fails to separate genuine from decoy interaction partners and in several tasks accuracy depends on the presence of related complexes in the training set. Taken together, our assessment is more cautious than for AF2, whose gains in modelling monomers and complexes were clear and broadly generalisable. AF3's extension to new biomolecule types shows less consistent performance and generalisation. AF3 can be a powerful tool for hypothesis generation and prioritisation, but its predictions and use of confidence metrics will depend strongly on the specific application area and must be interpreted with respect to training-set overlap. We expect that the benchmarks provided here will serve for testing of future developments in the structure prediction field.

16

Engineering Functional CLA-Targeting CAR Approaches for Pancreatic Ductal Adenocarcinoma

Dourlens, C.; Vanderliek, K.; Geiger, L.; Burzan, N.; Tomiuk, S.; Droste, M.; Felsberger, A.; Hubrich, H.; Winkler, J.; Hardt, O.; Schaefer, D.

2026-07-09 immunology 10.64898/2026.07.03.736395 medRxiv

Top 0.6%

0.3%

Pancreatic cancer remains a highly lethal malignancy with limited therapeutic options. Chimeric antigen receptor (CAR) therapy has revolutionized the treatment of hematological cancers but still faces major limitations in solid tumors, particularly due to the scarcity of tumor-specific targets. Cutaneous lymphocyte antigen (CLA) recently emerged as a promising PDAC target due to its high tumor expression and limited presence in healthy tissues. However, previously reported CLA-directed CAR constructs lacked antitumor functionality. Here, we investigated multiple strategies to generate functional CLA-targeting CAR approaches. We first hypothesized that impaired activity resulted from fratricide caused by CLA expression on activated T cells. CLA knockout was successfully achieved through deletion of fucosyltransferase-7, but not by knockout of the major CLA carrier backbones CD162, CD44 or CD43, suggesting additional CLA carriers or compensatory regulation. As CLA knockout alone did not restore CAR-mediated killing, we explored whether insufficient binding affinity limited CAR activity. Affinity maturation was performed in silico and in vitro using yeast surface display, identifying 39 candidate mutations, although none restored cytotoxicity. We finally switched to an AdCAR strategy using anti-biotin CAR T cells combined with biotinylated anti-CLA scFv-Fc adapters. This approach enabled efficient, concentration-dependent cytotoxicity with both CLA-targeting binders. Additionally, we identified a dynamic, cell density-dependent regulation of CLA expression. Finally, glycan profiling of CLA binders further revealed broader-than-expected glycan interactions, suggesting a potentially wider definition of the CLA family. Overall, our findings establish CLA as a functional PDAC immunotherapy target while revealing unexpected complexity in its regulation and molecular presentation.

17

TCR-FramePose: a local-frame representation for decomposing global docking and CDR3 loop geometry in TCR-pMHC recognition

Kim, K. H.; Jiang, X.; Ye, Q.; Mohanty, V.; Dede, M.; Reuben, A.; Chen, K.

2026-07-04 bioinformatics 10.64898/2026.06.30.735664 medRxiv

Top 0.6%

0.3%

T cell receptor recognition of peptide-MHC depends on sequence, interface chemistry, and three-dimensional geometry, but docking geometry is often summarized at the whole-receptor level, leaving CDR3-local pose difficult to compare across structures. We introduce TCR-FramePose, a local-frame descriptor set that represents each TCR-pMHC complex as three bodies - whole TCR, CDR3a, and CDR3b - measured relative to a pMHC groove frame. For each body, FramePose decomposes the native pose into reach, offset direction on S^2, and orientation on SO(3); for tangent-space analyses, these components are mapped to six coordinates per body and 18 coordinates per complex. Applied to 378 curated abTCR-pMHC crystal structures, FramePose recovered known class-associated receptor-placement differences and additionally resolved whole-TCR and CDR3b orientation shifts that were not captured by crossing angle. The same orientation coordinates identified reverse-polarity and off-axis outliers as distinct modes. In cross-validated association analyses, FramePose added nonredundant BSA- and affinity-associated information beyond conventional descriptors, and the modest affinity gain was concentrated in CDR3 orientation blocks, which were least recoverable from conventional descriptors. Biological grouping analyses showed that shared receptor pose over peptide-MHC was organized primarily by germline V-region framework. TCRs recognizing the same peptide-MHC target favored shared FramePose geometries rather than strong receptor-specific divergence, whereas CDR3 sequence did not detectably reposition the rigid-body pose after antigen context and germline framework were fixed. MHC allele and peptide length contributed smaller adjustments, localized mainly to CDR3b and groove-normal orientation axes. Finally, interface analyses showed that affinity tracked interface burial, with CDR3b reach linking FramePose geometry to binding through buried surface area. Within engineered panels, mutation-level effects were panel-specific, with CDR3b remodeling localizing to a recurrent interface region but varying in direction across receptors. These properties enable FramePose to serve as a geometric filter for in silico TCR-pMHC models and as a feature layer for structure-guided TCR engineering. Together, TCR-FramePose provides a nonredundant geometric layer for structure-guided TCR-pMHC analysis, linking germline-scaffolded recognition, CDR3-local pose, and interface organization without replacing sequence, contact, or energetic descriptors.
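
The decomposition into reach, offset direction, and orientation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the S^2 and SO(3) components are mapped to coordinates, so this sketch assumes spherical angles for the direction and a rotation vector (axis times angle) for the orientation. The function names `decompose_pose` and `complex_coordinates` are hypothetical, and each body's pose is assumed to be given as a rotation matrix and translation already expressed in the groove frame.

```python
import math

def rotation_vector(R):
    """Rotation vector (axis * angle, radians) of a 3x3 rotation matrix.
    Assumption: used here as one possible 3-coordinate chart on SO(3)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    angle = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    if angle < 1e-9:
        return (0.0, 0.0, 0.0)  # identity rotation
    s = 2.0 * math.sin(angle)
    if abs(s) < 1e-9:
        # Near 180 degrees the antisymmetric part vanishes; not handled here.
        raise ValueError("rotation angle too close to pi for this sketch")
    axis = ((R[2][1] - R[1][2]) / s,
            (R[0][2] - R[2][0]) / s,
            (R[1][0] - R[0][1]) / s)
    return tuple(angle * a for a in axis)

def decompose_pose(R, t):
    """Six coordinates for one body: reach (1), direction on S^2 as
    polar/azimuth angles (2), orientation as a rotation vector (3).
    R and t are assumed to be expressed in the pMHC groove frame."""
    reach = math.sqrt(sum(c * c for c in t))
    if reach == 0.0:
        raise ValueError("offset direction is undefined at zero reach")
    ux, uy, uz = (c / reach for c in t)
    polar = math.acos(max(-1.0, min(1.0, uz)))
    azimuth = math.atan2(uy, ux)
    return (reach, polar, azimuth) + rotation_vector(R)

def complex_coordinates(poses):
    """Concatenate the three bodies into 18 coordinates per complex.
    `poses` maps a body name to its (R, t) pair."""
    coords = []
    for body in ("TCR", "CDR3a", "CDR3b"):
        R, t = poses[body]
        coords.extend(decompose_pose(R, t))
    return coords
```

In this sketch, a body sitting 10 units along the groove normal and rotated 90 degrees about that axis would have reach 10, polar angle 0, and a rotation vector pointing along z with length pi/2. A rotation vector is only one of several possible orientation charts, and it becomes ill-conditioned near 180 degrees, which matters for the reverse-polarity outliers the abstract mentions.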

18

Machine learning guided cell-free expression maps the biochemical landscape of carbonic anhydrase

Lazar, J. T.; Komp, E.; Martinez, I.; Zolkin, K.; Notin, P. M.; Saleh, S.; Landwehr, G.; Kim, K.; Tian, A.; Shapero, B.; Karim, A. S.; Marks, D.; Beckham, G. T.; Jewett, M. C.

2026-07-08 synthetic biology 10.64898/2026.07.07.736810 medRxiv

Top 0.6%

0.3%

Carbonic anhydrases are among the fastest known biocatalysts, reversibly facilitating the hydration of CO2 to HCO3- at rates up to 10^7 s^-1, which warrants their investigation for industrial carbon capture technologies. However, engineering carbonic anhydrases to maintain stability under harsh industrial process conditions remains a key challenge, and sequence-to-function datasets compatible with machine learning to inform forward engineering are lacking. Here, we developed a high-throughput platform that couples cell-free gene expression with a gaseous CO2 colorimetric assay to map the fitness landscapes of carbonic anhydrases. From 96 diverse natural homologs, we identified a robust variant from the Aquificota phylum and conducted an exhaustive mutational scan and functional assessment of this enzyme at 70 °C and 90 °C, covering >99% of all single-amino acid substitutions (totaling 4,365 mutations assayed in 39,285 reactions). This biochemical landscape was used to benchmark 22 zero-shot protein fitness models and identify critical mutations that improved enzyme stability at 90 °C by more than three-fold. We then used both zero-shot protein language models and supervised learning to filter 419 model-generated variants from a ProteinMPNN library of 100,000 sequences, leading to a best-in-class enzyme that retained activity after incubation at 95 °C. This work demonstrates that integrating cell-free enzyme engineering with machine learning enables opportunities for high-throughput experimental measurements to benchmark and improve protein language models, accelerate design loops, and expand functional exploration within protein families where experimental information is limited.

19

Barcoded-Plasmid DNA library construction for recording cell lineage trees enabled by a Scalable and modular Biofoundry-based Automated Robotic Pipeline

Tassinari, E.; Ives, L.; Hawkins, E.; Annese, D.; Fonseca, S.; Lan, Y.; Haerty, W.; Wojtowicz, E.; Grandellis, C.

2026-07-08 synthetic biology 10.64898/2026.07.07.736956 medRxiv

Top 0.6%

0.3%

High-quality plasmid DNA purification at high throughput remains a significant bottleneck in molecular biology and bioengineering. Current methods frequently fail to deliver sufficient yields of pure, transfection-grade DNA required for genetic engineering applications in mammalian cells. Here, we present a Biofoundry-based automated pipeline using the CyBio FeliX robotic liquid handling platform to rapidly purify plasmid DNA with minimal manual intervention. The protocol leverages Solid Phase Reversible Immobilisation (SPRI)-based magnetic bead technology to ensure consistency, scalability, and DNA purity suitable for downstream viral particle production and mammalian cell transfection. The pipeline supports flexible processing of between 8 and 96 samples per run, making it adaptable across a wide range of experimental scales. The protocol is openly available via the Earlham Institute GitHub repository, enabling broad adoption across the bioscientific community and contributing to the growing toolkit of reproducible, scalable engineering biology workflows. In this work, we employed an integrated robotic pipeline to process 528 pooled DNA plasmids, built a lentiviral DNA plasmid library for lineage tracing, validated the library by sequencing, and demonstrated efficacy in downstream mammalian cell transfection experiments.

20

Development and Characterisation of a Versatile Single-Domain Antibody Specific for M1-linked Ubiquitin Chains

Koch, J.; Bhark, S.-J.; Bader, V.; Fiil, B. K.; Lopez-Mendez, B.; Rasthoej, J. B.; Priesmann, D.; Mejias-Gomez, O.; Braghetto, M.; Montoya, G.; Gyrd-Hansen, M.; Winklhofer, K. F.; Goletz, S.; Damgaard, R. B.

2026-07-06 biochemistry 10.64898/2026.07.05.736589 medRxiv

Top 0.6%

0.3%

Ubiquitin signalling is mediated by structurally distinct polyubiquitin chains that encode discrete cellular functions. Progress in deciphering this ubiquitin code, particularly for the less abundant atypical chain types, has been hindered by limited availability of versatile chain type-specific affinity reagents. Here, we demonstrate that human single-domain antibodies (sdAbs) provide a versatile scaffold for the generation of ubiquitin linkage-specific binders. Using phage display and synthetic human sdAb libraries, we identified 2A6, an sdAb that specifically recognises methionine-1 (M1)-linked ubiquitin chains. To our knowledge, 2A6 represents the first reported sdAb with specificity for a defined homotypic ubiquitin chain linkage. 2A6 bound M1-linked ubiquitin chains with nanomolar affinity and was specific for M1-linked chains at the level of both diubiquitin and long polyubiquitin chains. AlphaFold3 modelling, supported by saturation mutagenesis, predicted that 2A6 recognises the proximal and distal ubiquitin moieties together with the region near the M1 linkage. Functionally, 2A6 enabled specific detection and enrichment of M1-linked ubiquitin across multiple applications, including ELISA, immunoblotting, immunoprecipitation under semi-denaturing conditions, substrate ubiquitination analysis, and immunofluorescence microscopy. The sdAb can be readily produced in E. coli from a single expression plasmid, providing a tractable, cost-effective and versatile reagent for investigating M1-linked ubiquitin signalling. Our work establishes sdAbs as a versatile scaffold for ubiquitin linkage-specific affinity reagents, providing a framework for the development of analogous binders specifically targeting additional ubiquitin linkages or architectures.